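Features of the R language

  • Compared with commercially licensed languages of the same kind (Matlab, SAS, SPSS, Stata): R is open source
  • Compared with open-source languages of the same kind (Julia): R has a larger community
  • Functional programming

Core idea: apply functions to objects.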

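Setting up an R development environment

  • The development environment has two components:
  1. The computing core (interpreter)
  2. An Integrated Development Environment (IDE)

Computing core: R

Integrated development environment: RStudio

From now on, simply launch RStudio

  • Windows users can find it in the Start menu
  • macOS users can find it in Applications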

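The RStudio interface

  • Script: write code
  • Console: interact with the computing core
  • Environment: lists the objects and functions the user has declared so far
  • Multi-purpose pane:
    • Files: controls the paths used for reading and writing files
    • Plots: previews visualization output
    • Packages: the package manager
    • Help: look up documentation
  • Press Alt + Shift + K to see the list of RStudio keyboard shortcuts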

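Quick start

Use <- to declare objects or functions

  • = is not wrong, but <- is the established convention
  • In RStudio, the shortcut Alt + - inserts <-
# Declare objects
my_favorite_star <- "Tom Cruise"
my_lucky_number <- 24
r_is_easy <- TRUE

# Declare a function
say_hello <- function(){
  return("Hello R!")
}

Print objects or call functions

# Print objects
my_favorite_star
my_lucky_number
r_is_easy

# Call the function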
say_hello()
## [1] "Tom Cruise"
## [1] 24
## [1] TRUE
## [1] "Hello R!"

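Use rm() to remove objects

rm(r_is_easy)
r_is_easy # Error

Use # for comments

  • R only has single-line comments
# This is R Programming and Data Science Applications
# This is the System Training Program
# The classroom is in the NTU CSIE building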

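R Console shortcuts

  • Press Ctrl + L to clear the console
  • Press the up and down arrow keys to browse previously executed code

The R Console keeps showing +

  • The console shows + because R is still waiting for us to finish an incomplete input, for example:
my_favorite_player <- "Steve Nash # missing the closing double quote

say_hello <- function(){
  return("Hello R!")
# missing the closing curly brace

help(print # missing the closing parenthesis
  • There are two ways out:
  1. Finish the input
  2. Start over: press Esc and retype the code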

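R packages

  • R, like other modern programming languages, relies heavily on packages to extend its base functionality
  • Using an R package involves two steps:
    • Install: install.packages()
    • Load: library()
  • Installing is done once; loading must be repeated in every session that uses the package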
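
The install-once, load-every-session pattern can be sketched as follows. `ensure_package` is a hypothetical helper name, not part of R, and the `tools` package is used only because it ships with R, so nothing is downloaded when the example runs.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg) # one-time step; needs a CRAN mirror and network access
  }
  library(pkg, character.only = TRUE) # needed again in every new session
  invisible(pkg)
}

ensure_package("tools") # "tools" ships with R, so no installation happens here
"package:tools" %in% search() # TRUE once the package is attached
```

For a package from CRAN, the same call would trigger `install.packages()` only the first time.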

Common functions: help() and ?

  • Look up a function or a dataset
help(print) # ?print
help(cars) # ?cars

Common functions: sessionInfo()

  • Returns information about the current R session
sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.4  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.4.4     htmltools_0.3.6 yaml_2.1.19     Rcpp_0.12.16   
##  [9] stringi_1.2.4   rmarkdown_1.9   knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.15   evaluate_0.11

Common functions: Sys.getlocale()

  • Returns the computer's locale settings
Sys.getlocale()
## [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

Common functions: getwd()

  • Returns the current working directory (Get Working Directory)
getwd()
## [1] "/Users/kuoyaojen/r_programming"

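Common functions: setwd()

  • Sets the working directory (Set Working Directory)
  • Important: when writing file paths in code, always use forward slashes (/)
  • The backslash (\) has another role in programming languages: escape sequences and Unicode escapes
  • Check your computer's user name (USERNAME), and avoid user names containing Chinese characters where possible
setwd("/Users/USERNAME/Desktop") # Desktop on macOS
setwd("C:/Users/USERNAME/Desktop") # Desktop on Windows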
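
A small sketch of the slash rule, assuming the same placeholder `USERNAME` as above. It shows `file.path()` building a path with forward slashes, how a backslash has to be doubled inside a string, and how to restore the previous working directory.

```r
# file.path() joins path components with "/" on every platform
desktop <- file.path("C:", "Users", "USERNAME", "Desktop")
desktop

# Inside a string literal a backslash starts an escape sequence,
# so a literal backslash must be written twice
windows_style <- "C:\\Users"
nchar(windows_style) # 8: the doubled backslash is stored as one character

# setwd() invisibly returns the previous directory, which makes it easy to restore
old_wd <- setwd(tempdir())
setwd(old_wd)
```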

Common functions: q()

  • Quits R (and RStudio)
  • Prefer not to save the workspace image
q()

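A typical roadmap for learning a programming language

  • Install a compiler (C, C++, Java, ...) or an interpreter (R, Python, JavaScript, ...)
  • Pick a friendly integrated development environment (IDE)
  • Quick start
  • Variable types (scalars)
  • Flow control (Booleans, logical values)
  • Functions
  • Data structures (arrays, vectors, dictionaries, lists, ...)
  • Iteration (for, while)
  • (Optional) Classes
  • (Optional) Packages and modules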

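Variable types

  • The class() function reports the type of a variable
  • Numbers
    • Floating point (numeric)
    • (Optional) Integer (integer)
    • (Optional) Complex (complex)
  • Text (character)
  • Logical values (logical)
  • Dates (Date)
  • Date-times (POSIXct)

Numbers

  • Use class() to inspect the type of floating-point numbers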
# help(class)
# ?class
class(24) # numeric
my_lucky_number <- 24
class(my_lucky_number) # numeric
class(2.4) # numeric
class(-8.7) # numeric
class(0) # numeric
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"
## [1] "numeric"

Optional numeric types

  • Integer (integer)
  • Complex (complex)
my_lucky_integer <- 87
class(my_lucky_integer) # numeric
my_lucky_integer <- 87L
class(my_lucky_integer) # integer
my_lucky_complex <- 8 + 7i
class(my_lucky_complex) # complex
my_lucky_integer
my_lucky_complex
## [1] "numeric"
## [1] "integer"
## [1] "complex"
## [1] 87
## [1] 8+7i

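Arithmetic

  • Addition, subtraction, multiplication, division: +, -, *, /
  • Exponentiation: ^, **
  • Remainder: %%
  • Integer quotient: %/%
  • Multiplication and division are evaluated before addition and subtraction, and exponentiation before both; parentheses change the order of evaluation
  • Square brackets and curly braces have other uses in R; only parentheses can group arithmetic
5**2 # 25
5^3  # 125
9**(1/2) # 3
27**(1/3) # 3
8 %% 6 # 2
8 %/% 6 # 1
9**1/2 # 4.5 operator precedence: evaluated as (9**1)/2
27**1/3 # 9 operator precedence: evaluated as (27**1)/3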
## [1] 25
## [1] 125
## [1] 3
## [1] 3
## [1] 2
## [1] 1
## [1] 4.5
## [1] 9
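
A short sketch of how `%/%` and `%%` fit together, including a negative number, which the examples above do not cover. The values are chosen only for illustration.

```r
x <- -7
y <- 3
q <- x %/% y   # -3: integer division rounds toward negative infinity
r <- x %% y    #  2: the remainder takes the sign of the divisor
q * y + r == x # TRUE: quotient and remainder always rebuild x

-2^2   # -4: ^ is evaluated before the unary minus
(-2)^2 #  4
```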

Arithmetic practice

\[BMI = \frac{weight_{kg}}{height_{m}^2}\]

  • Compute Jeremy Lin's body mass index; he is 191 cm tall and weighs 91 kg.
# Direct calculation
jeremy_bmi <- 91/(1.91**2)
jeremy_bmi

# Assign first, then calculate
jeremy_height <- 191/100
jeremy_weight <- 91
jeremy_bmi <- jeremy_weight / jeremy_height**2
jeremy_bmi
## [1] 24.94449
## [1] 24.94449

Compute the BMI of any person whose height is height cm and whose weight is weight kg.

# Shape of a function
FUNCTION_NAME <- function(x, arg_1, arg_2, ...) {
  # Use x, arg_1, arg_2 to produce the result
  return(RESULT)
}
get_bmi <- function(height, weight) {
  height_m <- height * 0.01
  bmi <- weight / height_m**2
  return(bmi)
}
steveNash_bmi <- get_bmi(191, 82)
jeremyLin_bmi <- get_bmi(191, 91)
shaq_bmi <- get_bmi(216, 148)
steveNash_bmi
jeremyLin_bmi
shaq_bmi
## [1] 22.47745
## [1] 24.94449
## [1] 31.72154

(Optional) Arithmetic that mixes integers and complex numbers

class(8 + 7) # numeric
class(8L + 7L) # integer
class(8L + 7) # numeric
class(7 + 8i + 7L) # complex
class(7 + 8i + 7) # complex
## [1] "numeric"
## [1] "integer"
## [1] "numeric"
## [1] "complex"
## [1] "complex"

Text

  • Mark text with single or double quotes
class('Tom Cruise') # character
class("Tom Cruise") # character
my_favorite_star <- 'Tom Cruise'
class(my_favorite_star) # character
my_favorite_star <- "Tom Cruise"
class(my_favorite_star) # character
## [1] "character"
## [1] "character"
## [1] "character"
## [1] "character"
  • When does the choice between single and double quotes matter?
my_fav_player <- 'Shaquille O'Neal' # Error: the apostrophe ends the string early
# Solution 1
my_fav_player <- "Shaquille O'Neal" # Correct
my_fav_player
# Solution 2
my_fav_player <- 'Shaquille O\'Neal' # Correct
my_fav_player

Practice assigning text

ross_script <- "Let's put aside the fact that you \"accidentally\" pick up my grand mother's ring."
ross_script <- 'Let\'s put aside the fact that you "accidentally" pick up my grand mother\'s ring.'
  • sprintf(): formatted text output (string print with format)
  • Format specifiers:
    • %s text
    • %.2f floating point with two decimal places
    • %d integer
get_bmi <- function(height, weight) {
  height_m <- height * 0.01
  bmi <- weight / height_m**2
  return(bmi)
}

print_bmi <- function(player_name, player_bmi) {
  bmi_fmt <- sprintf("%s has a BMI of %.2f", player_name, player_bmi)
  return(bmi_fmt)
}

steveNash_bmi <- get_bmi(191, 82)
jeremyLin_bmi <- get_bmi(191, 91)
shaq_bmi <- get_bmi(216, 148)
print_bmi("Steve Nash", steveNash_bmi)
print_bmi("Jeremy Lin", jeremyLin_bmi)
print_bmi("Shaquille O'Neal", shaq_bmi)
## [1] "Steve Nash has a BMI of 22.48"
## [1] "Jeremy Lin has a BMI of 24.94"
## [1] "Shaquille O'Neal has a BMI of 31.72"
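
A few more sprintf() cases, offered as a sketch rather than a full reference. The point worth knowing is that %d accepts a value stored as a double only when it is whole-valued; the jersey numbers reuse figures from this tutorial.

```r
sprintf("%d", 24L)          # "24"
sprintf("%d", 24)           # "24": a whole-valued double is accepted
sprintf("%5.1f", 3.14159)   # "  3.1": width 5, one decimal place

# sprintf() is vectorized over its arguments
sprintf("%s wears #%d", c("Steve Nash", "Paul Pierce"), c(13L, 34L))

# %d with a fractional double is an error; tryCatch() captures it here
result <- tryCatch(sprintf("%d", 24.5), error = function(e) "error")
result # "error"
```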

Logical values

  • TRUE must be written in all capitals
  • FALSE must be written in all capitals
class(TRUE)  # logical 
class(FALSE) # logical
class(True) # Error
class(false) # Error
r_is_awesome <- TRUE
class(r_is_awesome) # logical
r_is_commercial_licensed <- FALSE
class(r_is_commercial_licensed) # logical

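Operators that produce logical values

  • Greater than >, greater than or equal to >=
  • Less than <, less than or equal to <=
  • Equal to == (note that a single = is already reserved for assignment)
  • Not equal to !=
  • ! negates TRUE/FALSE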
8 > 7 # TRUE
8 <= 7 # FALSE
8 == 7 # FALSE
8 != 7 # TRUE
!FALSE # TRUE
!TRUE # FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE
  • Ordering of letters (locale-dependent; shown here for the en_US.UTF-8 locale): Z > z > Y > y > X > x ... > B > b > A > a
"Z" > "z" # TRUE
"z" > "Y" # TRUE
"Y" > "y" # TRUE
# ...
"b" > "A" # TRUE
"A" > "a" # TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
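
Because text comparison follows the locale's collation rules, the same comparison can give different answers on another machine. The sketch below contrasts locale-aware sorting with `method = "radix"`, which sorts character vectors in the C locale; the letters are arbitrary.

```r
letters_mixed <- c("b", "A", "a", "B")

# Locale-aware ordering: the result depends on Sys.getlocale("LC_COLLATE")
sort(letters_mixed)

# method = "radix" uses the C locale, where every upper-case letter
# comes before every lower-case letter
sort(letters_mixed, method = "radix")
```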

Where logical values are used

  • Flow control: if, else if, else
  • Data filtering (data science applications)
# Flow control
if (CONDITION_1) {
  # What to do when CONDITION_1 holds
} else if (CONDITION_2) {
  # What to do when CONDITION_2 holds
} else if (CONDITION_3) {
  # What to do when CONDITION_3 holds
} else {
  # What to do when none of CONDITION_1, CONDITION_2, and CONDITION_3 holds
}

Flow control

  • If the computed BMI exceeds 30, report "Overweight"
shaq_bmi <- get_bmi(216, 150)
if (shaq_bmi > 30) {
  print("Overweight")
}
## [1] "Overweight"
  • Steve Nash's BMI is at most 30
# No output
steve_bmi <- get_bmi(191, 82)
if (steve_bmi > 30) {
  print("Overweight")
}
# Two-way branch
steve_bmi <- get_bmi(191, 82)
if (steve_bmi > 30) {
  print("Overweight")
} else {
  print("Not overweight")
}
## [1] "Not overweight"
# Three-way branch
steve_bmi <- get_bmi(191, 82)
if (steve_bmi > 30) {
  print("Overweight")
} else if (steve_bmi < 18.5) {
  print("Underweight")
} else {
  print("Normal")
}
## [1] "Normal"

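Function scope

  • global
  • local
  • Functions and objects defined in the global environment can be used in a local environment
  • Functions and objects defined in a local environment cannot be used in the global environment
# Objects declared in global can be used in local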
jeremy_height <- 191 # global
jeremy_weight <- 91 # global

get_jeremy_bmi <- function() {
  height_m <- jeremy_height * 0.01
  bmi <- jeremy_weight / height_m**2
  return(bmi)
}

get_jeremy_bmi()
## [1] 24.94449
# Objects declared in local cannot be used in global
get_steve_bmi <- function() {
  steve_height <- 191 # local
  steve_weight <- 82 # local
  bmi <- steve_weight / (steve_height/100)**2
  return(bmi)
}
get_steve_bmi()
steve_height # Error: object 'steve_height' not found
steve_weight # Error: object 'steve_weight' not found

Practice: a four-way branch

# Four-way branch
get_bmi_label <- function(height, weight, player_name) {
  player_bmi <- get_bmi(height, weight) # function in global env
  if (player_bmi < 18.5) {
    bmi_label <- "Underweight"
  } else if (player_bmi > 30) {
    bmi_label <- "Obese"
  } else if (player_bmi >= 18.5 & player_bmi < 25) {
    bmi_label <- "Normal"
  } else {
    bmi_label <- "Overweight"
  }
  return(sprintf("%s falls in the BMI range: %s", player_name, bmi_label))
}
get_bmi_label(191, 82, "Steve Nash")
get_bmi_label(216, 148, "Shaquille O'Neal")
## [1] "Steve Nash falls in the BMI range: Normal"
## [1] "Shaquille O'Neal falls in the BMI range: Obese"

Logical values can be used as numbers

  • TRUE: 1
  • FALSE: 0
TRUE + TRUE
FALSE + FALSE
TRUE + FALSE
## [1] 2
## [1] 0
## [1] 1

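Operators that combine logical tests

  • & intersection (Shift + 7), AND
    • The result is TRUE only when both sides are TRUE
  • | union (Shift + \), OR
    • The result is FALSE only when both sides are FALSE
# Overweight test: BMI > 30 and body_fat > 0.25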
shaq_bmi # 32.15021
shaq_body_fat <- 0.10
shaq_bmi > 30 # TRUE
shaq_body_fat > 0.25 # FALSE
is_shaq_overweight <- (shaq_bmi > 30) & (shaq_body_fat > 0.25)
is_shaq_overweight # FALSE
## [1] 32.15021
## [1] TRUE
## [1] FALSE
## [1] FALSE
# Overweight test: BMI > 30 or body_fat > 0.25
is_shaq_overweight <- (shaq_bmi > 30) | (shaq_body_fat > 0.25)
is_shaq_overweight # TRUE
nash_bmi <- get_bmi(191, 82)
nash_body_fat <- 0.08
is_nash_overweight <- (nash_bmi > 30) | (nash_body_fat > 0.25)
is_nash_overweight
## [1] TRUE
## [1] FALSE
# Truth table for AND
TRUE & TRUE # TRUE
TRUE & FALSE # FALSE
FALSE & TRUE # FALSE
FALSE & FALSE # FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] FALSE
# Truth table for OR
TRUE | TRUE # TRUE
TRUE | FALSE # TRUE
FALSE | TRUE # TRUE
FALSE | FALSE # FALSE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
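
The operators & and | also work element by element on whole vectors, while their doubled forms && and || handle single values and stop evaluating as soon as the answer is known. A sketch with made-up BMI and body-fat values:

```r
bmis <- c(32.2, 22.5, 27.0)       # illustrative values
body_fats <- c(0.10, 0.08, 0.30)  # illustrative values

# & and | compare element by element and return a vector
bmis > 30 & body_fats > 0.25  # FALSE FALSE FALSE
bmis > 30 | body_fats > 0.25  # TRUE FALSE TRUE

# && and || work on single values and short-circuit:
# the stop() calls below are never evaluated
FALSE && stop("never evaluated") # FALSE
TRUE || stop("never evaluated")  # TRUE

any(bmis > 30) # TRUE: at least one element passes
all(bmis > 30) # FALSE: not every element passes
```

Inside `if ()`, which expects a single TRUE or FALSE, `&&` and `||` are the usual choice.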

Testing whether an integer is even (two-way branch)

my_integer <- 7L
if (my_integer %% 2 == 0) {
  print("Even")
} else {
  print("Odd")
}
# print("Even")
# print("Odd")
## [1] "Odd"

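The spirit of flow control

  • M.E.C.E.
  • Mutually Exclusive, Collectively Exhaustive.
# fizz-buzz
fizz_buzz <- function(x) {
  # If x is a multiple of 15: return "fizz buzz" (checked first, because such an x is also a multiple of 3 and of 5)
  # If x is a multiple of 5: return "buzz"
  # If x is a multiple of 3: return "fizz"
  # Otherwise return x
  if (x %% 15 == 0) {
    return("fizz buzz")
  } else if (x %% 5 == 0) {
    return("buzz")
  } else if (x %% 3 == 0) {
    return("fizz")
  } else {
    return(x)
  }
  # Control would continue here after the branches (never reached, because every branch returns)
}
fizz_buzz(6) # fizz
fizz_buzz(25) # buzz
fizz_buzz(45) # fizz buzz
fizz_buzz(46) # 46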
## [1] "fizz"
## [1] "buzz"
## [1] "fizz buzz"
## [1] 46

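Type checking

  • is.numeric() tests whether a value is numeric (floating point or integer)
  • is.character() tests whether a value is text
  • is.logical() tests whether a value is logical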
is.numeric(87) # TRUE
is.numeric("87") # FALSE
is.numeric(TRUE) # FALSE
is.character("87") # TRUE
is.character(87) # FALSE
is.character(FALSE) # FALSE
is.logical(FALSE) # TRUE
is.logical("TRUE") # FALSE
is.logical(0) # FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
my_lucky_number <- "24"
if (class(my_lucky_number) == "numeric") {
  sprintf("%d is of type %s", my_lucky_number, class(my_lucky_number))
} else {
  sprintf("%s is not of type numeric", my_lucky_number)
}
## [1] "24 is not of type numeric"
my_lucky_number <- "24"
if (is.numeric(my_lucky_number)) {
  sprintf("%d is of type %s", my_lucky_number, class(my_lucky_number))
} else {
  sprintf("%s is not of type numeric", my_lucky_number)
}
## [1] "24 is not of type numeric"
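
One detail the checks above do not show: is.numeric() is TRUE for integers as well as doubles, so it is broader than comparing class() with "numeric". A brief sketch:

```r
is.numeric(87L)  # TRUE: integers count as numeric
is.integer(87L)  # TRUE
is.integer(87)   # FALSE: 87 without the L suffix is stored as a double
is.double(87)    # TRUE
class(87L)       # "integer"
typeof(87)       # "double"

# inherits() tests classes that have no dedicated is.*() function, such as Date
inherits(Sys.Date(), "Date") # TRUE
```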

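Type conversion

  • as.numeric() converts to floating point
    • Logical values become 1 (TRUE) or 0 (FALSE)
    • Text that does not represent a number becomes NA
  • as.character() converts to text
    • Always succeeds
  • as.logical() converts to a logical value
    • The number 0 becomes FALSE; every other number becomes TRUE
    • Ordinary text becomes NA
    • "TRUE", "True", "true", "FALSE", "False", "false" all convert successfully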
# as.numeric()
r_is_awesome <- TRUE # logical
as.numeric(r_is_awesome) # 1
as.character(r_is_awesome) # "TRUE"
class(r_is_awesome) # logical
r_is_awesome <- as.numeric(r_is_awesome) # Assignment
class(r_is_awesome)
as.numeric("Luke Skywalker") # NA: Not Available, a missing value
## Warning: NAs introduced by coercion
## [1] 1
## [1] "TRUE"
## [1] "logical"
## [1] "numeric"
## [1] NA
# as.character()
as.character(8.7)
as.character(TRUE)
## [1] "8.7"
## [1] "TRUE"
# as.logical()
as.logical(1) # TRUE
as.logical(0) # FALSE
as.logical(2) # TRUE
as.logical(-1) # TRUE
as.logical("Luke Skywalker") # NA
as.logical("TRUE") # TRUE
as.logical("FALSE") # FALSE
as.logical("1") # NA
as.logical("0") # NA
as.logical("True") # TRUE
as.logical("true") # TRUE
as.logical("False") # FALSE
as.logical("false") # FALSE
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] NA
## [1] TRUE
## [1] FALSE
## [1] NA
## [1] NA
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE
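
When converting user-supplied text, it often helps to convert first and then check for NA. A minimal sketch; `to_number` is a hypothetical helper name, and the inputs reuse values from this tutorial.

```r
# Hypothetical helper: convert text to numbers, returning NA quietly on failure
to_number <- function(x) {
  suppressWarnings(as.numeric(x))
}

inputs <- c("24", "8.7", "Luke Skywalker")
converted <- to_number(inputs)
converted        # 24.0  8.7   NA
is.na(converted) # FALSE FALSE  TRUE
```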

Dates

  • Sys.Date() returns today's date
  • as.Date() converts text to a date
  • Dates support arithmetic
  • help(strptime) documents the format codes that map text to dates
sys_date <- Sys.Date() # case sensitive
sys_date # looks like a character value at first glance
class(sys_date)
sys_date_chr <- as.character(sys_date)
sys_date_chr
class(sys_date_chr)
sys_date + 1
# sys_date_chr + 1 # Error
mysteric_number <- as.numeric(sys_date) # the mysterious number: days since 1970-01-01
sys_date - mysteric_number
## [1] "2018-09-03"
## [1] "Date"
## [1] "2018-09-03"
## [1] "character"
## [1] "2018-09-04"
## [1] "1970-01-01"
original_date <- "1970-01-01" # character
original_date <- as.Date(original_date) # Date
as.numeric(original_date)
original_date + 1
original_date - 1
## [1] 0
## [1] "1970-01-02"
## [1] "1969-12-31"
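
Text that is not in the default year-month-day layout needs a format string. The sketch below uses the format codes documented in help(strptime); the date itself is the one from the output above.

```r
# The format argument tells as.Date() how to read non-ISO text
us_style <- as.Date("09/03/2018", format = "%m/%d/%Y")
us_style                     # "2018-09-03"
format(us_style, "%Y/%m/%d") # "2018/09/03"

# Subtracting two dates gives a difftime measured in days
gap <- as.Date("2018-09-04") - as.Date("2018-09-03")
as.numeric(gap)              # 1
```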

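Date practice: how many years since Beyond was formed

The Hong Kong rock band Beyond was formed in 1983. Assuming the founding date is 1983-12-31, how many years has it been as of the system date?

# Solution 1: difference in days (dividing by 365 ignores leap days, so the result is approximate)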
beyond_start_date <- as.Date("1983-12-31") # Date
sys_date <- Sys.Date() # Date
days_diff <- as.numeric(sys_date) - as.numeric(beyond_start_date) # numeric
years_diff <- days_diff / 365
ceiling(years_diff)
floor(years_diff)
round(years_diff)
## [1] 35
## [1] 34
## [1] 35
sys_date <- Sys.Date()
sys_date
# Solution 2: extract the year part via substr()
sys_year <- substr(sys_date, start = 1, stop = 4) # character
sys_year <- as.numeric(sys_year) # numeric
sys_year - 1983
# Solution 3: extract the year part via format()
sys_year <- format(sys_date, format = "%Y") # character
sys_year <- as.numeric(sys_year) # numeric
sys_year - 1983
## [1] "2018-09-03"
## [1] 35
## [1] 35
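
The solutions above either approximate a year as 365 days or ignore the month and day. One possible refinement, sketched here, counts completed years by checking whether this year's anniversary has already passed. `years_since` is a hypothetical helper, and its `today` argument exists only to make the example reproducible.

```r
# Hypothetical helper: number of completed years between two dates
years_since <- function(start_date, today = Sys.Date()) {
  start_date <- as.Date(start_date)
  today <- as.Date(today)
  years <- as.numeric(format(today, "%Y")) - as.numeric(format(start_date, "%Y"))
  # Subtract one if this year's anniversary has not arrived yet
  if (format(today, "%m%d") < format(start_date, "%m%d")) {
    years <- years - 1
  }
  years
}

years_since("1983-12-31", today = "2018-09-03") # 34
years_since("1983-12-31", today = "2018-12-31") # 35
```

Whether the intended answer is 34 or 35 depends on whether "anniversary" means completed years or calendar-year difference.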

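Date-times

  • Sys.time() returns the current date and time
  • as.POSIXct() converts text to a date-time
  • Date-times support arithmetic
  • help(strptime) documents the format codes that map text to date-times
sys_time <- Sys.time()
sys_time
class(sys_time) # POSIXct
mysteric_number <- as.numeric(sys_time) # the mysterious number: seconds since 1970-01-01 00:00:00 UTC
sys_time - mysteric_number
original_datetime <- as.POSIXct("1970-01-01 08:00:00") # the epoch as seen from the CST (UTC+8) time zone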
as.numeric(original_datetime)
original_datetime - 1
original_datetime + 1
## [1] "2018-09-03 16:48:11 CST"
## [1] "POSIXct" "POSIXt" 
## [1] "1970-01-01 08:00:00 CST"
## [1] 0
## [1] "1970-01-01 07:59:59 CST"
## [1] "1970-01-01 08:00:01 CST"

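Date-time practice: time between the main shock and the aftershock (in seconds)

At 01:47:16 on September 21, 1999, an earthquake of Richter magnitude 7.3 struck with its epicenter in Jiji Township, Nantou County. At 01:57:15 on the same day, the first aftershock above magnitude 6 occurred. How much time separated that aftershock from the main shock?

major_shock <- as.POSIXct("1999-09-21 01:47:16")
after_shock <- as.POSIXct("1999-09-21 01:57:15")
secs_diff <- as.numeric(after_shock) - as.numeric(major_shock)
secs_diff
sprintf("The first aftershock above magnitude 6 came %d seconds after the main shock", secs_diff)
mins_diff <- secs_diff %/% 60
mins_secs_diff <- secs_diff %% 60
mins_diff
mins_secs_diff
sprintf("The first aftershock above magnitude 6 came %d min %d s after the main shock", mins_diff, mins_secs_diff)
## [1] 599
## [1] "The first aftershock above magnitude 6 came 599 seconds after the main shock"
## [1] 9
## [1] 59
## [1] "The first aftershock above magnitude 6 came 9 min 59 s after the main shock"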
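
The same calculation can be written with difftime(), which lets the unit be stated explicitly. This sketch also passes tz so the result does not depend on the machine's time zone; using "Asia/Taipei" is an assumption based on the event's location.

```r
major_shock <- as.POSIXct("1999-09-21 01:47:16", tz = "Asia/Taipei")
after_shock <- as.POSIXct("1999-09-21 01:57:15", tz = "Asia/Taipei")

gap <- difftime(after_shock, major_shock, units = "secs")
as.numeric(gap) # 599

# The epoch is midnight UTC, whatever the local time zone is
as.numeric(as.POSIXct("1970-01-01 00:00:00", tz = "UTC")) # 0
```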

Functions

  • Built-in functions: too many to count
  • Structure of a user-defined function
FUNCTION_NAME <- function(x, arg1, arg2, ...) {
  # Function Body
  # Utilizing x, arg1, arg2 to generate OUTPUT
  return(OUTPUT)
}

Why use functions?

  • They give a program structure
    • Section 1: load packages
    • Section 2: define functions
    • Section 3: declare objects and call functions
# Section 1
library(Pkg_1)
library(Pkg_2)
library(Pkg_3)

# Section 2
FUNCTION_1 <- function() {
  # ...
  return(OUTPUT_1)
}

FUNCTION_2 <- function() {
  # ...
  return(OUTPUT_2)
}

# Section 3
OBJ_1 <- ...
OBJ_2 <- ...
FUNCTION_1(OBJ_1)
FUNCTION_2(OBJ_2)
  • They let us reuse code
get_age <- function(birth_year) {
  sys_date <- Sys.Date()
  sys_year <- format(sys_date, format = "%Y")
  sys_year <- as.numeric(sys_year)
  age <- sys_year - birth_year
  return(age)
}
get_age(1983) # 35 Beyond
get_age(1999) # 19 MayDay
get_age(1960) # 58 Beatles
## [1] 35
## [1] 19
## [1] 58
# A fizz-buzz variant for multiples of 2 and 3
fizz_buzz <- function(x) {
  if (x %% 6 == 0) {
    sprintf("%d is a multiple of both 2 and 3", x)
  } else if (x %% 3 == 0) {
    sprintf("%d is a multiple of 3", x)
  } else if (x %% 2 == 0) {
    sprintf("%d is a multiple of 2", x)
  } else {
    sprintf("%d is a multiple of neither 2 nor 3", x)
  }
}
fizz_buzz(9)
fizz_buzz(10)
fizz_buzz(11)
fizz_buzz(12)
## [1] "9 is a multiple of 3"
## [1] "10 is a multiple of 2"
## [1] "11 is a multiple of neither 2 nor 3"
## [1] "12 is a multiple of both 2 and 3"

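Data structures

  • Vectors (vector)
  • Lists (list)
  • Data frames (data.frame)
  • (Optional) Factors (factor)
  • (Optional) Matrices (matrix)
  • (Optional) Arrays (array)

Vectors (vector)

  • The first stop on the way from scalars to data structures
  • The most common way to create a vector: c()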
help(c) # ?c
my_lucky_number <- 24
my_lucky_numbers <- c(24, 34, 18, 87, 78)
my_lucky_numbers
fav_super_heroes <- c("Ironman", "Superman", "Batman", "Spiderman", "Wonder Woman")
fav_super_heroes
## [1] 24 34 18 87 78
## [1] "Ironman"      "Superman"     "Batman"       "Spiderman"   
## [5] "Wonder Woman"

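Use square brackets [] to select elements of a vector

  • Vector indexes start at 1
  • Select the first lucky number: my_lucky_numbers[1]
  • Select the first superhero: fav_super_heroes[1]
  • Indexing (selecting one element)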
my_lucky_numbers[1] # 24
fav_super_heroes[1] # "Ironman"
## [1] 24
## [1] "Ironman"

Put c() inside the brackets to select several elements

  • Several elements can be selected at once
  • Slicing (selecting several elements)
my_lucky_numbers[c(1, 4)] # 24 87
marvel_heroes <- fav_super_heroes[c(1, 4)] # Ironman Spiderman
dc_heroes <- fav_super_heroes[c(2, 3, 5)] # Superman Batman Wonder Woman
marvel_heroes
dc_heroes
## [1] 24 87
## [1] "Ironman"   "Spiderman"
## [1] "Superman"     "Batman"       "Wonder Woman"

Negative numbers inside the brackets

  • Exclusion: negative indexes drop the listed positions
my_lucky_numbers[-1] # 34 18 87 78
my_lucky_numbers[c(-1, -5)] # 34 18 87
marvel_heroes <- fav_super_heroes[c(1, 4)] # Ironman Spiderman
dc_heroes <- fav_super_heroes[c(-1, -4)] # Superman Batman Wonder Woman
marvel_heroes
dc_heroes
## [1] 34 18 87 78
## [1] 34 18 87
## [1] "Ironman"   "Spiderman"
## [1] "Superman"     "Batman"       "Wonder Woman"

Use the length to select the last element of a vector

  • length() returns the length of a vector
help(length) # ?length
length(my_lucky_numbers) # 5
length(dc_heroes) # 3
length(marvel_heroes) # 2
my_lucky_numbers[length(my_lucky_numbers)] # 78
dc_heroes[length(dc_heroes)] # Wonder Woman
marvel_heroes[length(marvel_heroes)] # Spiderman
## [1] 5
## [1] 3
## [1] 2
## [1] 78
## [1] "Wonder Woman"
## [1] "Spiderman"

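Select elements with logical values produced by a condition

  • Select the lucky numbers that are odd
  • The vector is the basic unit of computation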
my_lucky_numbers[4] # 87
my_lucky_numbers[c(-1, -2, -3, -5)] # 87
my_lucky_numbers[my_lucky_numbers %% 2 != 0] # 87
## [1] 87
## [1] 87
## [1] 87
  • Select Ironman
fav_super_heroes[1] # Ironman
fav_super_heroes[c(-2, -3, -4, -5)] # Ironman
fav_super_heroes[fav_super_heroes == "Ironman"] # Ironman
## [1] "Ironman"
## [1] "Ironman"
## [1] "Ironman"

Practice selecting elements

  • Use slicing and a condition to select the surviving Avengers
  • Use the %in% operator
avengers <- c("Iron Man", "Thor", "Hulk", "Captain America", "Black Widow", "War Machine", "Doctor Strange", "Spider-Man", "Black Panther", "Gamora")
# Slicing
survived_avengers <- avengers[c(-7, -8, -9, -10)]
depreciated_avengers <- avengers[7:10]
survived_avengers
depreciated_avengers
# Condition
depreciated <- c("Doctor Strange", "Spider-Man", "Black Panther", "Gamora")
depreciated_avengers <- avengers[avengers %in% depreciated]
depreciated_avengers
survived_avengers <- avengers[!(avengers %in% depreciated)]
survived_avengers
## [1] "Iron Man"        "Thor"            "Hulk"            "Captain America"
## [5] "Black Widow"     "War Machine"    
## [1] "Doctor Strange" "Spider-Man"     "Black Panther"  "Gamora"        
## [1] "Doctor Strange" "Spider-Man"     "Black Panther"  "Gamora"        
## [1] "Iron Man"        "Thor"            "Hulk"            "Captain America"
## [5] "Black Widow"     "War Machine"
  • Compute the BMIs and pull out the values above 25
set.seed(123)
heights <- sample(150:180, size = 100, replace = TRUE)
weights <- sample(50:80, size = 100, replace = TRUE)
bmis <- weights / (heights/100)**2
bmis[bmis > 25] # bmis > 25
writeLines("\n")
bmis[bmis <= 25] # bmis <= 25
writeLines("\n")
bmis[!(bmis > 25)] # bmis <= 25
##  [1] 27.23922 25.21625 33.77045 28.30600 28.32658 30.75740 27.18163
##  [8] 34.64760 26.56250 27.35885 29.93759 29.33333 26.02617 25.05931
## [15] 30.40763 31.64432 25.55885 32.04588 33.33187 25.47666 28.62147
## [22] 25.96953 27.00513 25.68956 32.44444 26.02617 28.35306 30.07812
## [29] 29.04866 30.83289 27.05380 29.04866 32.04995 31.25000 31.24876
## [36] 26.63892 30.29778
## 
## 
##  [1] 19.81768 24.76757 19.97441 21.70513 22.23099 20.07733 24.34381
##  [8] 21.93635 17.44126 17.23643 19.35021 22.34352 20.17715 20.65754
## [15] 17.28395 21.10727 19.59646 18.42404 21.63115 21.53491 24.57787
## [22] 23.93899 23.78121 24.87772 23.62445 19.05197 21.21832 24.03441
## [29] 23.12467 19.33373 22.83288 23.24380 24.46460 20.47827 24.34961
## [36] 19.00391 18.71095 20.10916 21.60410 22.49135 24.43519 19.91837
## [43] 24.84098 23.51020 18.28571 20.47827 21.82995 23.05456 19.25703
## [50] 23.53304 23.38435 20.41522 19.15709 24.08822 17.90123 21.38594
## [57] 24.89706 24.45606 20.41522 23.87543 20.47827 20.07733 24.60973
## 
## 
##  [1] 19.81768 24.76757 19.97441 21.70513 22.23099 20.07733 24.34381
##  [8] 21.93635 17.44126 17.23643 19.35021 22.34352 20.17715 20.65754
## [15] 17.28395 21.10727 19.59646 18.42404 21.63115 21.53491 24.57787
## [22] 23.93899 23.78121 24.87772 23.62445 19.05197 21.21832 24.03441
## [29] 23.12467 19.33373 22.83288 23.24380 24.46460 20.47827 24.34961
## [36] 19.00391 18.71095 20.10916 21.60410 22.49135 24.43519 19.91837
## [43] 24.84098 23.51020 18.28571 20.47827 21.82995 23.05456 19.25703
## [50] 23.53304 23.38435 20.41522 19.15709 24.08822 17.90123 21.38594
## [57] 24.89706 24.45606 20.41522 23.87543 20.47827 20.07733 24.60973
which(bmis > 25) # positions where BMI > 25
which(bmis <= 25) # positions where BMI <= 25
writeLines("")
heights[which(bmis > 25)] # heights where BMI > 25
weights[which(bmis > 25)] # weights where BMI > 25
##  [1]  1  4  6  7 14 15 17 18 19 26 30 35 36 37 38 39 40 45 51 56 57 62 63
## [24] 73 74 75 76 79 80 81 83 85 90 93 95 96 98
##  [1]   2   3   5   8   9  10  11  12  13  16  20  21  22  23  24  25  27
## [18]  28  29  31  32  33  34  41  42  43  44  46  47  48  49  50  52  53
## [35]  54  55  58  59  60  61  64  65  66  67  68  69  70  71  72  77  78
## [52]  82  84  86  87  88  89  91  92  94  97  99 100
## 
##  [1] 158 177 151 166 167 153 157 151 160 171 154 150 164 173 156 159 157
## [18] 154 151 156 153 152 161 172 150 164 156 160 153 157 162 153 155 160
## [35] 159 155 152
##  [1] 68 79 77 78 79 72 67 79 68 80 71 66 70 75 74 80 63 76 76 62 67 60 70
## [24] 76 73 70 69 77 68 76 71 68 77 80 79 64 70
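
Because TRUE counts as 1 and FALSE as 0, a logical vector can also be summarized directly. A sketch on a short, made-up BMI vector so the results are easy to verify by eye:

```r
bmis <- c(27.2, 19.8, 24.8, 30.8, 22.2) # illustrative values

is_high <- bmis > 25
sum(is_high)    # 2: how many values exceed 25
mean(is_high)   # 0.4: share of values that exceed 25
which(is_high)  # 1 4: their positions
which.max(bmis) # 4: position of the largest value
```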

Functions that generate sequence vectors

  • seq()
seq(from = 11, to = 20) # 11:20
11:20
seq(from = 11, to = 21, by = 2)
seq(from = 11, to = 21, length.out = 6)
##  [1] 11 12 13 14 15 16 17 18 19 20
##  [1] 11 12 13 14 15 16 17 18 19 20
## [1] 11 13 15 17 19 21
## [1] 11 13 15 17 19 21

Other functions that generate vectors

  • rep() builds a vector of repeated elements
rep("Iron Man", times = 5)
rep(7, times = 3)
## [1] "Iron Man" "Iron Man" "Iron Man" "Iron Man" "Iron Man"
## [1] 7 7 7

A vector can hold only one variable type

mixed_types <- c(24, "Kobe Bryant")
class(mixed_types) # character
mixed_types <- c(24, TRUE)
class(mixed_types) # numeric
mixed_types <- c(24, "Kobe Bryant", TRUE)
class(mixed_types) # character
## [1] "character"
## [1] "numeric"
## [1] "character"
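
When types are mixed, c() converts every element to the most flexible type present, in the order logical, integer, double, character. A small sketch using typeof() to make each step visible:

```r
typeof(c(TRUE, 1L))        # "integer": logical is promoted to integer
typeof(c(1L, 2.5))         # "double": integer is promoted to double
typeof(c(2.5, "a"))        # "character": everything becomes text
c(24, TRUE)                # 24  1
c(24, "Kobe Bryant", TRUE) # "24" "Kobe Bryant" "TRUE"
```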

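Element-wise operations

  • Many R functions are vectorized: they operate on every element of a vector
  • toupper() is a text function that converts text to upper case
  • abs() is a numeric function that returns absolute values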
toupper("Kobe Bryant") # KOBE BRYANT
fav_players <- c("Steve Nash", "Paul Pierce", "Scottie Pippen", "Kevin Garnett", "Shaquille O'Neal")
toupper(fav_players)
abs(-87)
random_numbers <- sample(-100:-1, size = 5)
random_numbers
abs(random_numbers)
## [1] "KOBE BRYANT"
## [1] "STEVE NASH"       "PAUL PIERCE"      "SCOTTIE PIPPEN"  
## [4] "KEVIN GARNETT"    "SHAQUILLE O'NEAL"
## [1] 87
## [1] -77  -5 -42 -51 -62
## [1] 77  5 42 51 62
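
Arithmetic operators are element-wise too, so the BMI formula can be applied to whole vectors without a loop. The heights and weights below are the ones passed to get_bmi() earlier in this tutorial.

```r
heights <- c(191, 191, 216) # cm: Steve Nash, Jeremy Lin, Shaquille O'Neal
weights <- c(82, 91, 148)   # kg

bmis <- weights / (heights / 100)^2 # one BMI per player
round(bmis, 2) # 22.48 24.94 31.72

# A shorter vector is recycled to match the longer one
c(1, 2, 3, 4) * c(10, 100) # 10 200 30 400
```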

Lists (list)

  • Created with the list() function
fav_players <- c("Steve Nash", "Paul Pierce", "Scottie Pippen", "Kevin Garnett", "Shaquille O'Neal")
jersey_numbers <- c(13, 34, 33, 21, 34)
player_list <- list(
  fav_players,
  jersey_numbers
)
player_list
## [[1]]
## [1] "Steve Nash"       "Paul Pierce"      "Scottie Pippen"  
## [4] "Kevin Garnett"    "Shaquille O'Neal"
## 
## [[2]]
## [1] 13 34 33 21 34

Use double square brackets to extract an element of a list

player_list[[1]] # players
player_list[[2]] # jerseys
## [1] "Steve Nash"       "Paul Pierce"      "Scottie Pippen"  
## [4] "Kevin Garnett"    "Shaquille O'Neal"
## [1] 13 34 33 21 34

Lists are often nested

  • Like peeling an onion, select one layer at a time
player_list[[1]][1] # Steve Nash
player_list[[2]][1] # 13
## [1] "Steve Nash"
## [1] 13
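
The difference between single and double brackets on a list is easy to miss, so here is a short sketch on a trimmed-down version of player_list:

```r
player_list <- list(
  c("Steve Nash", "Paul Pierce"),
  c(13, 34)
)

class(player_list[1])    # "list": single brackets return a shorter list
class(player_list[[1]])  # "character": double brackets return the element itself
length(player_list[1])   # 1: a list holding one element
length(player_list[[1]]) # 2: the character vector has two names
```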

List elements often have different lengths

rating <- 8.6 # length: 1
movie_length <- "2h 29min" # length: 1
genre <- c("Action", "Adventure", "Fantasy") # length: 3
cast <- c("Robert Downey Jr.", "Chris Hemsworth", "Mark Ruffalo", "Chris Evans", "Scarlett Johansson") # length: 5
avengers_movie <- list(
  rating,
  movie_length,
  genre,
  cast
)
avengers_movie
## [[1]]
## [1] 8.6
## 
## [[2]]
## [1] "2h 29min"
## 
## [[3]]
## [1] "Action"    "Adventure" "Fantasy"  
## 
## [[4]]
## [1] "Robert Downey Jr."  "Chris Hemsworth"    "Mark Ruffalo"      
## [4] "Chris Evans"        "Scarlett Johansson"

List elements are often named

avengers_movie <- list(
  movieRating = rating,
  movieLength = movie_length,
  movieGenre = genre,
  movieCast = cast
)
avengers_movie
## $movieRating
## [1] 8.6
## 
## $movieLength
## [1] "2h 29min"
## 
## $movieGenre
## [1] "Action"    "Adventure" "Fantasy"  
## 
## $movieCast
## [1] "Robert Downey Jr."  "Chris Hemsworth"    "Mark Ruffalo"      
## [4] "Chris Evans"        "Scarlett Johansson"

Select list elements by name

  • LIST[["ELEMENT_NAME"]]: pass the name as a string
  • LIST$ELEMENT_NAME: pass the name as a bare symbol
avengers_movie[["movieRating"]] # 8.6
avengers_movie$movieRating # 8.6
## [1] 8.6
## [1] 8.6

Practice declaring a list

# A list describing the TV series Friends
genre <- "Sitcom"
starring <- c("Jennifer Aniston", "Courteney Cox", "Lisa Kudrow", "Matt LeBlanc", "Matthew Perry", "David Schwimmer")
seasons <- 10

friends_list <- list(
  genre = genre,
  starring = starring,
  seasons = seasons
)
friends_list[["genre"]]
friends_list[["starring"]][5] # "Matthew Perry"

friends_list[["starring"]][friends_list[["starring"]] == "Matthew Perry"]
friends_list$starring[friends_list$starring == "Matthew Perry"]
## [1] "Sitcom"
## [1] "Matthew Perry"
## [1] "Matthew Perry"
## [1] "Matthew Perry"

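Data frames (data.frame)

  • Similar to an Excel spreadsheet
  • A data frame has row indexes and column labels
  • It stores columns of different types but equal length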
fav_players <- c("Steve Nash", "Paul Pierce", "Scottie Pippen", "Kevin Garnett", "Shaquille O'Neal") # character
jersey_numbers <- c(13, 34, 33, 21, 34) # numeric
has_champion <- c(FALSE, rep(TRUE, times=4))
players_df <- data.frame(
  fav_players,
  jersey_numbers,
  has_champion
)
players_df # print the data.frame in the console
knitr::kable(players_df)
##        fav_players jersey_numbers has_champion
## 1       Steve Nash             13        FALSE
## 2      Paul Pierce             34         TRUE
## 3   Scottie Pippen             33         TRUE
## 4    Kevin Garnett             21         TRUE
## 5 Shaquille O'Neal             34         TRUE
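| fav_players      | jersey_numbers | has_champion |
|:-----------------|---------------:|:-------------|
| Steve Nash       |             13 | FALSE        |
| Paul Pierce      |             34 | TRUE         |
| Scottie Pippen   |             33 | TRUE         |
| Kevin Garnett    |             21 | TRUE         |
| Shaquille O’Neal |             34 | TRUE         |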

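Common data.frame functions

  • dim(): short for dimension; reports how many rows and columns the data frame has
  • nrow() and ncol(): return the two dimensions separately
  • head() and tail(): return the first or last six observations
  • str(): short for structure; returns a compact overview of the columns
  • summary(): returns descriptive statistics for each column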
phx_suns_0405 <- read.csv("~/Desktop/phx_0405.csv")

dim(phx_suns_0405) # the data has 18 rows and 9 columns
nrow(phx_suns_0405) # 18
ncol(phx_suns_0405) # 9
View(phx_suns_0405)
knitr::kable(phx_suns_0405)
head(phx_suns_0405)
tail(phx_suns_0405)
str(phx_suns_0405)
summary(phx_suns_0405)
## [1] 18  9
## [1] 18
## [1] 9
| No. | Player | Pos | Ht | Wt | Birth.Date | X | Exp | College |
|----:|:-------|:----|:---|---:|:-----------|:--|:----|:--------|
| 10 | Leandro Barbosa01 | PG | 6-3 | 194 | November 28 1982 | br | 1 | |
| 11 | Zarko Cabarkapa01 | PF | 6-11 | 235 | May 21 1981 | rs | 1 | |
| 45 | Steven Hunter01 | C | 7-0 | 220 | October 31 1981 | us | 3 | DePaul University |
| 21 | Jim Jackson01 | SF | 6-6 | 220 | October 14 1970 | us | 12 | Ohio State University |
| 23 | Casey Jacobsen01 | SF | 6-6 | 215 | March 19 1981 | us | 2 | Stanford University |
| 2 | Joe Johnson02 | SF | 6-7 | 240 | June 29 1981 | us | 3 | University of Arkansas |
| 30 | Maciej Lampe01 | C | 6-11 | 275 | February 5 1985 | pl | 1 | |
| 31 | Shawn Marion01 | PF | 6-7 | 220 | May 7 1978 | us | 5 | University of Nevada Las Vegas |
| 0 | Walter McCarty01 | PF | 6-10 | 230 | February 1 1974 | us | 8 | University of Kentucky |
| 13 | Steve Nash01 | PG | 6-3 | 195 | February 7 1974 | za | 8 | Santa Clara University |
| 46 | Bo Outlaw01 | PF | 6-8 | 210 | April 13 1971 | us | 11 | South Plains College University of Houston |
| 1 | Smush Parker01 | PG | 6-4 | 190 | June 1 1981 | us | 1 | Fordham University |
| 3 | Quentin Richardson01 | SG | 6-6 | 223 | April 13 1980 | us | 4 | DePaul University |
| 17 | Paul Shirley01 | PF | 6-10 | 230 | December 23 1977 | us | 2 | Iowa State University |
| 32 | Amar’e Stoudemire01 | C | 6-10 | 245 | November 16 1982 | us | 2 | |
| 1 | Yuta Tabuse01 | PG | 5-9 | 165 | October 5 1980 | jp | R | Brigham Young University Hawaii |
| 43 | Jake Voskuhl01 | C | 6-11 | 245 | November 1 1977 | us | 4 | University of Connecticut |
| 4 | Jackson Vroman01 | PF | 6-10 | 220 | June 6 1981 | us | R | Iowa State University |
##   No.                     Player Pos   Ht  Wt       Birth.Date  X Exp
## 1  10 Leandro Barbosa\\barbole01  PG  6-3 194 November 28 1982 br   1
## 2  11 Zarko Cabarkapa\\cabarza01  PF 6-11 235      May 21 1981 rs   1
## 3  45   Steven Hunter\\huntest01   C  7-0 220  October 31 1981 us   3
## 4  21     Jim Jackson\\jacksji01  SF  6-6 220  October 14 1970 us  12
## 5  23  Casey Jacobsen\\jacobca01  SF  6-6 215    March 19 1981 us   2
## 6   2     Joe Johnson\\johnsjo02  SF  6-7 240     June 29 1981 us   3
##                  College
## 1                       
## 2                       
## 3      DePaul University
## 4  Ohio State University
## 5    Stanford University
## 6 University of Arkansas
##    No.                        Player Pos   Ht  Wt       Birth.Date  X Exp
## 13   3 Quentin Richardson\\richaqu01  SG  6-6 223    April 13 1980 us   4
## 14  17       Paul Shirley\\shirlpa01  PF 6-10 230 December 23 1977 us   2
## 15  32  Amar'e Stoudemire\\stoudam01   C 6-10 245 November 16 1982 us   2
## 16   1        Yuta Tabuse\\tabusyu01  PG  5-9 165   October 5 1980 jp   R
## 17  43       Jake Voskuhl\\voskuja01   C 6-11 245  November 1 1977 us   4
## 18   4     Jackson Vroman\\vromaja01  PF 6-10 220      June 6 1981 us   R
##                            College
## 13               DePaul University
## 14           Iowa State University
## 15                                
## 16 Brigham Young University Hawaii
## 17       University of Connecticut
## 18           Iowa State University
## 'data.frame':    18 obs. of  9 variables:
##  $ No.       : int  10 11 45 21 23 2 30 31 0 13 ...
##  $ Player    : Factor w/ 18 levels "Amar'e Stoudemire\\stoudam01",..: 8 18 15 6 3 7 9 12 16 14 ...
##  $ Pos       : Factor w/ 5 levels "C","PF","PG",..: 3 2 1 4 4 4 1 2 2 3 ...
##  $ Ht        : Factor w/ 9 levels "5-9","6-10","6-11",..: 4 3 9 6 6 7 3 7 2 4 ...
##  $ Wt        : int  194 235 220 220 215 240 275 220 230 195 ...
##  $ Birth.Date: Factor w/ 18 levels "April 13 1971",..: 15 11 17 16 10 8 5 12 4 6 ...
##  $ X         : Factor w/ 6 levels "br","jp","pl",..: 1 4 5 5 5 5 3 5 5 6 ...
##  $ Exp       : Factor w/ 9 levels "1","11","12",..: 1 1 5 3 4 5 1 7 8 8 ...
##  $ College   : Factor w/ 13 levels "","Brigham Young University Hawaii",..: 1 1 3 6 9 10 1 13 12 7 ...
##       No.                                Player    Pos          Ht   
##  Min.   : 0.00   Amar'e Stoudemire\\stoudam01: 1   C :4   6-10   :4  
##  1st Qu.: 3.25   Bo Outlaw\\outlabo01        : 1   PF:6   6-11   :3  
##  Median :15.00   Casey Jacobsen\\jacobca01   : 1   PG:4   6-6    :3  
##  Mean   :18.50   Jackson Vroman\\vromaja01   : 1   SF:3   6-3    :2  
##  3rd Qu.:30.75   Jake Voskuhl\\voskuja01     : 1   SG:1   6-7    :2  
##  Max.   :46.00   Jim Jackson\\jacksji01      : 1          5-9    :1  
##                  (Other)                     :12          (Other):3  
##        Wt                   Birth.Date  X           Exp   
##  Min.   :165.0   April 13 1971   : 1   br: 1   1      :4  
##  1st Qu.:211.2   April 13 1980   : 1   jp: 1   2      :3  
##  Median :220.0   December 23 1977: 1   pl: 1   3      :2  
##  Mean   :220.7   February 1 1974 : 1   rs: 1   4      :2  
##  3rd Qu.:233.8   February 5 1985 : 1   us:13   8      :2  
##  Max.   :275.0   February 7 1974 : 1   za: 1   R      :2  
##                  (Other)         :12           (Other):3  
##                             College 
##                                 :4  
##  DePaul University              :2  
##  Iowa State University          :2  
##  Brigham Young University Hawaii:1  
##  Fordham University             :1  
##  Ohio State University          :1  
##  (Other)                        :7

How to take apart a data.frame

  • Select a single value (scalar)
phx_suns_0405[16, "Player"] # Yuta Tabuse
phx_suns_0405[16, 2] # Yuta Tabuse
phx_suns_0405[ , "Player"][16] # Yuta Tabuse
phx_suns_0405$Player[16] # Yuta Tabuse
## [1] Yuta Tabuse\\tabusyu01
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
## [1] Yuta Tabuse\\tabusyu01
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
## [1] Yuta Tabuse\\tabusyu01
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
## [1] Yuta Tabuse\\tabusyu01
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
  • Select one observation (row)
phx_suns_0405[16, ] # all information for Yuta Tabuse
phx_suns_0405[10, ] # all information for Steve Nash
##    No.                 Player Pos  Ht  Wt     Birth.Date  X Exp
## 16   1 Yuta Tabuse\\tabusyu01  PG 5-9 165 October 5 1980 jp   R
##                            College
## 16 Brigham Young University Hawaii
##    No.               Player Pos  Ht  Wt      Birth.Date  X Exp
## 10  13 Steve Nash\\nashst01  PG 6-3 195 February 7 1974 za   8
##                   College
## 10 Santa Clara University
  • Select one variable (column)
phx_suns_0405[ , "Player"]
phx_suns_0405$Player
##  [1] Leandro Barbosa\\barbole01    Zarko Cabarkapa\\cabarza01   
##  [3] Steven Hunter\\huntest01      Jim Jackson\\jacksji01       
##  [5] Casey Jacobsen\\jacobca01     Joe Johnson\\johnsjo02       
##  [7] Maciej Lampe\\lampema01       Shawn Marion\\mariosh01      
##  [9] Walter McCarty\\mccarwa01     Steve Nash\\nashst01         
## [11] Bo Outlaw\\outlabo01          Smush Parker\\parkesm01      
## [13] Quentin Richardson\\richaqu01 Paul Shirley\\shirlpa01      
## [15] Amar'e Stoudemire\\stoudam01  Yuta Tabuse\\tabusyu01       
## [17] Jake Voskuhl\\voskuja01       Jackson Vroman\\vromaja01    
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
##  [1] Leandro Barbosa\\barbole01    Zarko Cabarkapa\\cabarza01   
##  [3] Steven Hunter\\huntest01      Jim Jackson\\jacksji01       
##  [5] Casey Jacobsen\\jacobca01     Joe Johnson\\johnsjo02       
##  [7] Maciej Lampe\\lampema01       Shawn Marion\\mariosh01      
##  [9] Walter McCarty\\mccarwa01     Steve Nash\\nashst01         
## [11] Bo Outlaw\\outlabo01          Smush Parker\\parkesm01      
## [13] Quentin Richardson\\richaqu01 Paul Shirley\\shirlpa01      
## [15] Amar'e Stoudemire\\stoudam01  Yuta Tabuse\\tabusyu01       
## [17] Jake Voskuhl\\voskuja01       Jackson Vroman\\vromaja01    
## 18 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
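
Selecting a single column as above returns a vector (here a factor), not a data.frame. The sketch below uses a small hypothetical data frame `toy_df` (not the roster file) to show two ways of keeping the data.frame shape: `drop = FALSE` and single-bracket indexing by name.

```r
# Hypothetical toy data; stringsAsFactors = FALSE keeps text as character
toy_df <- data.frame(
  player = c("A", "B", "C"),
  wt = c(190, 220, 250),
  stringsAsFactors = FALSE
)

as_vector <- toy_df[, "player"]               # one column collapses to a vector
as_frame <- toy_df[, "player", drop = FALSE]  # stays a one-column data.frame
also_frame <- toy_df["player"]                # list-style indexing, also a data.frame

class(as_vector)
class(as_frame)
class(also_frame)
```

Which form to use depends on the next step: functions such as `mean()` expect a vector, while functions that take a table expect a data.frame.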

Exercise: build a data frame

starring <- c("Jennifer Aniston", "Courteney Cox", "Lisa Kudrow",
              "Matt LeBlanc", "Matthew Perry", "David Schwimmer") # actors' names
cast <- c("Rachel Green", "Monica Geller", "Phoebe Buffay",
          "Joey Tribbiani", "Chandler Bing", "Ross Geller") # characters' names
friends_df <- data.frame(
  starring,
  cast
)
friends_df # View(friends_df)
##           starring           cast
## 1 Jennifer Aniston   Rachel Green
## 2    Courteney Cox  Monica Geller
## 3      Lisa Kudrow  Phoebe Buffay
## 4     Matt LeBlanc Joey Tribbiani
## 5    Matthew Perry  Chandler Bing
## 6  David Schwimmer    Ross Geller

How to select observations with conditions

  • Select the centers
phx_suns_0405[c(3, 7, 15, 17), ] # by index
phx_suns_0405[phx_suns_0405$Pos == "C", ] # by condition
##    No.                       Player Pos   Ht  Wt       Birth.Date  X Exp
## 3   45     Steven Hunter\\huntest01   C  7-0 220  October 31 1981 us   3
## 7   30      Maciej Lampe\\lampema01   C 6-11 275  February 5 1985 pl   1
## 15  32 Amar'e Stoudemire\\stoudam01   C 6-10 245 November 16 1982 us   2
## 17  43      Jake Voskuhl\\voskuja01   C 6-11 245  November 1 1977 us   4
##                      College
## 3          DePaul University
## 7                           
## 15                          
## 17 University of Connecticut
##    No.                       Player Pos   Ht  Wt       Birth.Date  X Exp
## 3   45     Steven Hunter\\huntest01   C  7-0 220  October 31 1981 us   3
## 7   30      Maciej Lampe\\lampema01   C 6-11 275  February 5 1985 pl   1
## 15  32 Amar'e Stoudemire\\stoudam01   C 6-10 245 November 16 1982 us   2
## 17  43      Jake Voskuhl\\voskuja01   C 6-11 245  November 1 1977 us   4
##                      College
## 3          DePaul University
## 7                           
## 15                          
## 17 University of Connecticut
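  • Select the point guards (PG) and small forwards (SF)
  • Combining with & matches nothing, because one player's Pos cannot equal both values
  • Combine with |
  • Combine with %in%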
phx_suns_0405$Pos == "PG" & phx_suns_0405$Pos == "SF"
phx_suns_0405[phx_suns_0405$Pos == "PG" | phx_suns_0405$Pos == "SF", ]
phx_suns_0405[phx_suns_0405$Pos %in% c("PG", "SF"), c("Pos", "Player")]
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##    No.                     Player Pos  Ht  Wt       Birth.Date  X Exp
## 1   10 Leandro Barbosa\\barbole01  PG 6-3 194 November 28 1982 br   1
## 4   21     Jim Jackson\\jacksji01  SF 6-6 220  October 14 1970 us  12
## 5   23  Casey Jacobsen\\jacobca01  SF 6-6 215    March 19 1981 us   2
## 6    2     Joe Johnson\\johnsjo02  SF 6-7 240     June 29 1981 us   3
## 10  13       Steve Nash\\nashst01  PG 6-3 195  February 7 1974 za   8
## 12   1    Smush Parker\\parkesm01  PG 6-4 190      June 1 1981 us   1
## 16   1     Yuta Tabuse\\tabusyu01  PG 5-9 165   October 5 1980 jp   R
##                            College
## 1                                 
## 4            Ohio State University
## 5              Stanford University
## 6           University of Arkansas
## 10          Santa Clara University
## 12              Fordham University
## 16 Brigham Young University Hawaii
##    Pos                     Player
## 1   PG Leandro Barbosa\\barbole01
## 4   SF     Jim Jackson\\jacksji01
## 5   SF  Casey Jacobsen\\jacobca01
## 6   SF     Joe Johnson\\johnsjo02
## 10  PG       Steve Nash\\nashst01
## 12  PG    Smush Parker\\parkesm01
## 16  PG     Yuta Tabuse\\tabusyu01
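
The all-FALSE vector above appears because `&` requires both comparisons to hold for the same element. The sketch below shows the three operators side by side on a hypothetical position vector `pos` (illustrative values, not the roster).

```r
pos <- c("PG", "PF", "C", "SF", "SG")  # hypothetical positions

both <- pos == "PG" & pos == "SF"    # never TRUE: one value cannot equal both
either <- pos == "PG" | pos == "SF"  # TRUE where at least one comparison holds
member <- pos %in% c("PG", "SF")     # same result, easier to extend to many values

any(both)
identical(either, member)
which(member)  # positions of the elements that satisfy the condition
```

`%in%` tends to read better once more than two values are involved, since the list of values can also be stored in a variable.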

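Exercise: select data with conditions

  • Select the players who weigh between 190 and 250 pounds (inclusive)
  • Select the players who weigh less than 190 pounds or more than 250 pounds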
phx_suns_0405[phx_suns_0405$Wt >= 190 & phx_suns_0405$Wt <= 250, ]
phx_suns_0405[phx_suns_0405$Wt < 190 | phx_suns_0405$Wt > 250, ]
phx_suns_0405[!(phx_suns_0405$Wt >= 190 & phx_suns_0405$Wt <= 250), ]
##    No.                        Player Pos   Ht  Wt       Birth.Date  X Exp
## 1   10    Leandro Barbosa\\barbole01  PG  6-3 194 November 28 1982 br   1
## 2   11    Zarko Cabarkapa\\cabarza01  PF 6-11 235      May 21 1981 rs   1
## 3   45      Steven Hunter\\huntest01   C  7-0 220  October 31 1981 us   3
## 4   21        Jim Jackson\\jacksji01  SF  6-6 220  October 14 1970 us  12
## 5   23     Casey Jacobsen\\jacobca01  SF  6-6 215    March 19 1981 us   2
## 6    2        Joe Johnson\\johnsjo02  SF  6-7 240     June 29 1981 us   3
## 8   31       Shawn Marion\\mariosh01  PF  6-7 220       May 7 1978 us   5
## 9    0     Walter McCarty\\mccarwa01  PF 6-10 230  February 1 1974 us   8
## 10  13          Steve Nash\\nashst01  PG  6-3 195  February 7 1974 za   8
## 11  46          Bo Outlaw\\outlabo01  PF  6-8 210    April 13 1971 us  11
## 12   1       Smush Parker\\parkesm01  PG  6-4 190      June 1 1981 us   1
## 13   3 Quentin Richardson\\richaqu01  SG  6-6 223    April 13 1980 us   4
## 14  17       Paul Shirley\\shirlpa01  PF 6-10 230 December 23 1977 us   2
## 15  32  Amar'e Stoudemire\\stoudam01   C 6-10 245 November 16 1982 us   2
## 17  43       Jake Voskuhl\\voskuja01   C 6-11 245  November 1 1977 us   4
## 18   4     Jackson Vroman\\vromaja01  PF 6-10 220      June 6 1981 us   R
##                                       College
## 1                                            
## 2                                            
## 3                           DePaul University
## 4                       Ohio State University
## 5                         Stanford University
## 6                      University of Arkansas
## 8              University of Nevada Las Vegas
## 9                      University of Kentucky
## 10                     Santa Clara University
## 11 South Plains College University of Houston
## 12                         Fordham University
## 13                          DePaul University
## 14                      Iowa State University
## 15                                           
## 17                  University of Connecticut
## 18                      Iowa State University
##    No.                  Player Pos   Ht  Wt      Birth.Date  X Exp
## 7   30 Maciej Lampe\\lampema01   C 6-11 275 February 5 1985 pl   1
## 16   1  Yuta Tabuse\\tabusyu01  PG  5-9 165  October 5 1980 jp   R
##                            College
## 7                                 
## 16 Brigham Young University Hawaii
##    No.                  Player Pos   Ht  Wt      Birth.Date  X Exp
## 7   30 Maciej Lampe\\lampema01   C 6-11 275 February 5 1985 pl   1
## 16   1  Yuta Tabuse\\tabusyu01  PG  5-9 165  October 5 1980 jp   R
##                            College
## 7                                 
## 16 Brigham Young University Hawaii

Selecting with the subset() function

is_center <- phx_suns_0405$Pos == "C"
is_pg_sf <- phx_suns_0405$Pos %in% c("PG", "SF")
subset(phx_suns_0405, subset = is_center)
subset(phx_suns_0405, subset = is_pg_sf, select = c("Player", "Pos"))
##    No.                       Player Pos   Ht  Wt       Birth.Date  X Exp
## 3   45     Steven Hunter\\huntest01   C  7-0 220  October 31 1981 us   3
## 7   30      Maciej Lampe\\lampema01   C 6-11 275  February 5 1985 pl   1
## 15  32 Amar'e Stoudemire\\stoudam01   C 6-10 245 November 16 1982 us   2
## 17  43      Jake Voskuhl\\voskuja01   C 6-11 245  November 1 1977 us   4
##                      College
## 3          DePaul University
## 7                           
## 15                          
## 17 University of Connecticut
##                        Player Pos
## 1  Leandro Barbosa\\barbole01  PG
## 4      Jim Jackson\\jacksji01  SF
## 5   Casey Jacobsen\\jacobca01  SF
## 6      Joe Johnson\\johnsjo02  SF
## 10       Steve Nash\\nashst01  PG
## 12    Smush Parker\\parkesm01  PG
## 16     Yuta Tabuse\\tabusyu01  PG
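
`subset()` also accepts the condition written directly with column names, without a precomputed logical vector. The sketch below uses a hypothetical `toy_df` with one missing weight to show this, and to show one difference from bracket indexing: `subset()` drops rows whose condition is `NA`.

```r
# Hypothetical toy data with one missing weight
toy_df <- data.frame(
  player = c("A", "B", "C", "D"),
  pos = c("C", "PG", "SF", "C"),
  wt = c(245, NA, 215, 275),
  stringsAsFactors = FALSE
)

# Column names can be used directly inside subset()
centers <- subset(toy_df, subset = pos == "C", select = c(player, wt))

# Rows where the condition is NA are dropped by subset() ...
heavy_subset <- subset(toy_df, wt > 220)
# ... but come back as an all-NA row with bracket indexing
heavy_bracket <- toy_df[toy_df$wt > 220, ]

nrow(heavy_subset)
nrow(heavy_bracket)
```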

(Optional) Factor vectors (factor)

  • A special kind of character vector that carries Levels information
  • Each Level corresponds to an integer code
  • Dummy variable
length(phx_suns_0405$Player) # 18
class(phx_suns_0405$Pos)
as.numeric(phx_suns_0405$Pos)
class(as.character(phx_suns_0405$Pos))
as.numeric(as.character(phx_suns_0405$Pos))
## Warning: NAs introduced by coercion
## [1] 18
## [1] "factor"
##  [1] 3 2 1 4 4 4 1 2 2 3 2 3 5 2 1 3 1 2
## [1] "character"
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
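
The `NA` results above occur because position labels such as "PG" are not numbers. When the labels do look like numbers (as in the `Exp` column), `as.numeric()` on the factor returns the integer codes, not the values. A minimal sketch with hypothetical values:

```r
exp_years <- factor(c("8", "1", "12", "3"))  # hypothetical values stored as a factor

levels(exp_years)                              # sorted as text: "1" "12" "3" "8"
codes <- as.numeric(exp_years)                 # integer codes of the levels
values <- as.numeric(as.character(exp_years))  # the numbers themselves

codes
values
```

Converting through `as.character()` first is the usual way to recover the original numbers.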
  • New text values cannot be appended to a factor directly
class(phx_suns_0405$Player)
new_roster <- c(as.character(phx_suns_0405$Player), "Raja Bell")
factor(new_roster)
## [1] "factor"
##  [1] Leandro Barbosa\\barbole01    Zarko Cabarkapa\\cabarza01   
##  [3] Steven Hunter\\huntest01      Jim Jackson\\jacksji01       
##  [5] Casey Jacobsen\\jacobca01     Joe Johnson\\johnsjo02       
##  [7] Maciej Lampe\\lampema01       Shawn Marion\\mariosh01      
##  [9] Walter McCarty\\mccarwa01     Steve Nash\\nashst01         
## [11] Bo Outlaw\\outlabo01          Smush Parker\\parkesm01      
## [13] Quentin Richardson\\richaqu01 Paul Shirley\\shirlpa01      
## [15] Amar'e Stoudemire\\stoudam01  Yuta Tabuse\\tabusyu01       
## [17] Jake Voskuhl\\voskuja01       Jackson Vroman\\vromaja01    
## [19] Raja Bell                    
## 19 Levels: Amar'e Stoudemire\\stoudam01 ... Zarko Cabarkapa\\cabarza01
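
Besides rebuilding the factor from a character vector as above, the levels can be extended before assigning the new value. A minimal sketch with a hypothetical two-player factor `roster`:

```r
roster <- factor(c("A", "B"))  # hypothetical players

# Assigning a value that is not an existing level gives NA (with a warning)
broken <- roster
suppressWarnings(broken[3] <- "C")

# Extending the levels first makes the assignment work
fixed <- roster
levels(fixed) <- c(levels(fixed), "C")
fixed[3] <- "C"

broken
fixed
```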
  • Levels are sorted alphabetically by default
temperatures <- c("Cold", "Cool", "Warm", "Hot")
class(temperatures)
temperatures <- factor(temperatures)
class(temperatures)
temperatures
as.integer(temperatures)
temperatures <- factor(temperatures, levels = c("Cold", "Cool", "Warm", "Hot"))
temperatures
as.integer(temperatures)
## [1] "character"
## [1] "factor"
## [1] Cold Cool Warm Hot 
## Levels: Cold Cool Hot Warm
## [1] 1 2 4 3
## [1] Cold Cool Warm Hot 
## Levels: Cold Cool Warm Hot
## [1] 1 2 3 4
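  • Before R 4.0.0, read.csv() and data.frame() convert text to factor by default
  • Adding stringsAsFactors = FALSE reads text as character (this is the default from R 4.0.0 onward)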
phx_suns_0405 <- read.csv("~/Desktop/phx_0405.csv") # C:/Users/user/Desktop/phx_0405.csv
str(phx_suns_0405)
## 'data.frame':    18 obs. of  9 variables:
##  $ No.       : int  10 11 45 21 23 2 30 31 0 13 ...
##  $ Player    : Factor w/ 18 levels "Amar'e Stoudemire\\stoudam01",..: 8 18 15 6 3 7 9 12 16 14 ...
##  $ Pos       : Factor w/ 5 levels "C","PF","PG",..: 3 2 1 4 4 4 1 2 2 3 ...
##  $ Ht        : Factor w/ 9 levels "5-9","6-10","6-11",..: 4 3 9 6 6 7 3 7 2 4 ...
##  $ Wt        : int  194 235 220 220 215 240 275 220 230 195 ...
##  $ Birth.Date: Factor w/ 18 levels "April 13 1971",..: 15 11 17 16 10 8 5 12 4 6 ...
##  $ X         : Factor w/ 6 levels "br","jp","pl",..: 1 4 5 5 5 5 3 5 5 6 ...
##  $ Exp       : Factor w/ 9 levels "1","11","12",..: 1 1 5 3 4 5 1 7 8 8 ...
##  $ College   : Factor w/ 13 levels "","Brigham Young University Hawaii",..: 1 1 3 6 9 10 1 13 12 7 ...
phx_suns_0405 <- read.csv("~/Desktop/phx_0405.csv", stringsAsFactors = FALSE) # C:/Users/user/Desktop/phx_0405.csv
str(phx_suns_0405)
## 'data.frame':    18 obs. of  9 variables:
##  $ No.       : int  10 11 45 21 23 2 30 31 0 13 ...
##  $ Player    : chr  "Leandro Barbosa\\barbole01" "Zarko Cabarkapa\\cabarza01" "Steven Hunter\\huntest01" "Jim Jackson\\jacksji01" ...
##  $ Pos       : chr  "PG" "PF" "C" "SF" ...
##  $ Ht        : chr  "6-3" "6-11" "7-0" "6-6" ...
##  $ Wt        : int  194 235 220 220 215 240 275 220 230 195 ...
##  $ Birth.Date: chr  "November 28 1982" "May 21 1981" "October 31 1981" "October 14 1970" ...
##  $ X         : chr  "br" "rs" "us" "us" ...
##  $ Exp       : chr  "1" "1" "3" "12" ...
##  $ College   : chr  "" "" "DePaul University" "Ohio State University" ...
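
The same comparison can be run without a file on disk by passing CSV content through the `text` argument of `read.csv()`. The inline data below is hypothetical; setting `stringsAsFactors` explicitly makes the result the same on any R version.

```r
csv_text <- "player,pos,wt\nA,PG,190\nB,C,250"  # hypothetical inline CSV

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$pos)  # factor
class(as_char$pos)    # character
class(as_char$wt)     # integer: numeric columns are not affected
```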

(Optional) Matrices

my_vec <- 11:18
my_vec
my_mat <- matrix(my_vec)
my_mat
matrix(my_vec, nrow = 2)
matrix(my_vec, nrow = 4)
my_mat <- matrix(my_vec, nrow = 4)
## [1] 11 12 13 14 15 16 17 18
##      [,1]
## [1,]   11
## [2,]   12
## [3,]   13
## [4,]   14
## [5,]   15
## [6,]   16
## [7,]   17
## [8,]   18
##      [,1] [,2] [,3] [,4]
## [1,]   11   13   15   17
## [2,]   12   14   16   18
##      [,1] [,2]
## [1,]   11   15
## [2,]   12   16
## [3,]   13   17
## [4,]   14   18
my_mat[3, 2] # 17
my_mat[, 1]
my_mat[2, ]
my_mat[my_mat %% 2 == 0]
## [1] 17
## [1] 11 12 13 14
## [1] 12 16
## [1] 12 14 16 18
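  • Matrix multiplication: * multiplies element by element, %*% is the matrix product
my_mat * my_mat
#my_mat %*% my_mat # Error: non-conformable arguments (4x2 times 4x2)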
t(my_mat) %*% my_mat
my_mat %*% t(my_mat)
##      [,1] [,2]
## [1,]  121  225
## [2,]  144  256
## [3,]  169  289
## [4,]  196  324
##      [,1] [,2]
## [1,]  630  830
## [2,]  830 1094
##      [,1] [,2] [,3] [,4]
## [1,]  346  372  398  424
## [2,]  372  400  428  456
## [3,]  398  428  458  488
## [4,]  424  456  488  520
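
The matrix product needs the number of columns on the left to equal the number of rows on the right. The sketch below checks the shapes with `dim()` and uses `tryCatch()` to capture the error from the non-conformable product. The fallback string is an arbitrary choice for illustration.

```r
my_mat <- matrix(11:18, nrow = 4)  # 4 x 2, same as above

dim(my_mat)                # 4 2
dim(t(my_mat) %*% my_mat)  # (2 x 4) times (4 x 2) gives 2 x 2
dim(my_mat %*% t(my_mat))  # (4 x 2) times (2 x 4) gives 4 x 4

# (4 x 2) times (4 x 2) is not defined; capture the error instead of stopping
result <- tryCatch(my_mat %*% my_mat, error = function(e) "non-conformable")
result
```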

(Optional) Arrays (array)

  • An array generalizes a matrix to more than two dimensions
my_arr <- array(1:24, c(2, 3, 4))
my_arr
class(my_arr)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   13   15   17
## [2,]   14   16   18
## 
## , , 4
## 
##      [,1] [,2] [,3]
## [1,]   19   21   23
## [2,]   20   22   24
## 
## [1] "array"
  • Select elements with square brackets and index values
my_arr[, , 1]
my_arr[, , 2]
my_arr[, , 3]
my_arr[, , 4]
my_arr[1, 3, 4] # 23
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##      [,1] [,2] [,3]
## [1,]   13   15   17
## [2,]   14   16   18
##      [,1] [,2] [,3]
## [1,]   19   21   23
## [2,]   20   22   24
## [1] 23
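
Leaving an index empty keeps that whole dimension, and `apply()` can work along any dimension. A short sketch on the same array, computing one sum per slice of the third dimension:

```r
my_arr <- array(1:24, c(2, 3, 4))

dim(my_arr)                          # 2 3 4
slice_sums <- apply(my_arr, 3, sum)  # one sum for each of the 4 slices
second_row <- my_arr[2, , 1]         # second row of the first slice

slice_sums
second_row
```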

The tidyverse ecosystem

https://www.tidyverse.org/

  • Focusing on dplyr and ggplot2 is enough for now!

The dplyr package

  • Update R and RStudio first!
  • Install with install.packages("dplyr")
  • Load with library(dplyr)
install.packages("dplyr")
library(dplyr)

Basic functions provided by the dplyr package

  • select()
  • filter()
  • arrange()
  • mutate()
  • summarise()
  • group_by()

The mysterious operator %>% supported by dplyr functions

  • The traditional way to call a function
# traditional
my_vec <- 11:20
sum(my_vec)
## [1] 155
# traditional
my_vec <- -5:5
abs(my_vec)
##  [1] 5 4 3 2 1 0 1 2 3 4 5
  • Calling a function with the mysterious operator %>%
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# %>% 
my_vec <- 11:20
my_vec %>% 
  sum()
## [1] 155
library(dplyr)
# %>% 
my_vec <- -5:5
my_vec %>% 
  abs()
##  [1] 5 4 3 2 1 0 1 2 3 4 5

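The mysterious operator %>% is called the pipe operator

  • The RStudio shortcut is Ctrl (Command on macOS) + Shift + M
  • Why use this operator?
  • Take an exercise as an example:
# Mayday was formed in 1997
# Use the year from the system date to compute how many years ago Mayday was formed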
ans <- as.numeric(format(Sys.Date(), format = "%Y")) - 1997
ans
## [1] 21
  • Refactor with %>%!
library(dplyr)

ans <- Sys.Date() %>% 
  format(format="%Y") %>% 
  as.numeric() %>% 
  `-` (1997) # the backtick ` shares a key with the tilde ~, to the left of the number key 1
ans
## [1] 21
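
The idea behind the pipe is that `x %>% f()` passes `x` as the first argument of `f`, so steps are read left to right instead of inside out. The sketch below imitates this with a hypothetical toy operator `%then%`. It is for illustration only and is not how dplyr implements `%>%`.

```r
# Hypothetical toy operator: hand the left-hand value to the function on the right
`%then%` <- function(lhs, fun) fun(lhs)

my_vec <- -5:5

nested <- sqrt(sum(abs(my_vec)))                     # read from the inside out
piped <- my_vec %then% abs %then% sum %then% sqrt    # read from left to right

identical(nested, piped)
```

Unlike this toy version, the real `%>%` also accepts calls with extra arguments, such as `format(format = "%Y")` in the example above.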

The gapminder package

#install.packages("gapminder")
library(gapminder)

dim(gapminder)
head(gapminder)
## [1] 1704    6
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

A bit of inspiration…

#install.packages("plotly")
library(plotly)

gapminder %>% 
  plot_ly(x = ~gdpPercap, y = ~lifeExp, color = ~continent,
          size = ~pop, frame = ~year, text = ~country, 
          sizes = c(10, 500)) %>% 
  layout(xaxis = list(
    type = 'log'
  ))

select()

  • select() returns the chosen variables in data.frame (tibble) form
  • Selecting with $ or [[ ]] returns a vector
gapminder %>% 
  select(country) %>% 
  head()

head(gapminder$country) # head(gapminder[["country"]]); gapminder[, "country"] on a tibble returns a one-column tibble
## # A tibble: 6 x 1
##   country    
##   <fct>      
## 1 Afghanistan
## 2 Afghanistan
## 3 Afghanistan
## 4 Afghanistan
## 5 Afghanistan
## 6 Afghanistan
## [1] Afghanistan Afghanistan Afghanistan Afghanistan Afghanistan Afghanistan
## 142 Levels: Afghanistan Albania Algeria Angola Argentina ... Zimbabwe
gapminder %>% 
  select(country, continent)

gapminder[, c("country", "continent")]
## # A tibble: 1,704 x 2
##    country     continent
##    <fct>       <fct>    
##  1 Afghanistan Asia     
##  2 Afghanistan Asia     
##  3 Afghanistan Asia     
##  4 Afghanistan Asia     
##  5 Afghanistan Asia     
##  6 Afghanistan Asia     
##  7 Afghanistan Asia     
##  8 Afghanistan Asia     
##  9 Afghanistan Asia     
## 10 Afghanistan Asia     
## # ... with 1,694 more rows
## # A tibble: 1,704 x 2
##    country     continent
##    <fct>       <fct>    
##  1 Afghanistan Asia     
##  2 Afghanistan Asia     
##  3 Afghanistan Asia     
##  4 Afghanistan Asia     
##  5 Afghanistan Asia     
##  6 Afghanistan Asia     
##  7 Afghanistan Asia     
##  8 Afghanistan Asia     
##  9 Afghanistan Asia     
## 10 Afghanistan Asia     
## # ... with 1,694 more rows

filter()

Select observations with a logical condition

gapminder %>%
  filter(country == "Taiwan")

gapminder[gapminder$country == "Taiwan", ]
## # A tibble: 12 x 6
##    country continent  year lifeExp      pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Taiwan  Asia       1952    58.5  8550362     1207.
##  2 Taiwan  Asia       1957    62.4 10164215     1508.
##  3 Taiwan  Asia       1962    65.2 11918938     1823.
##  4 Taiwan  Asia       1967    67.5 13648692     2644.
##  5 Taiwan  Asia       1972    69.4 15226039     4063.
##  6 Taiwan  Asia       1977    70.6 16785196     5597.
##  7 Taiwan  Asia       1982    72.2 18501390     7426.
##  8 Taiwan  Asia       1987    73.4 19757799    11055.
##  9 Taiwan  Asia       1992    74.3 20686918    15216.
## 10 Taiwan  Asia       1997    75.2 21628605    20207.
## 11 Taiwan  Asia       2002    77.0 22454239    23235.
## 12 Taiwan  Asia       2007    78.4 23174294    28718.
## # A tibble: 12 x 6
##    country continent  year lifeExp      pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Taiwan  Asia       1952    58.5  8550362     1207.
##  2 Taiwan  Asia       1957    62.4 10164215     1508.
##  3 Taiwan  Asia       1962    65.2 11918938     1823.
##  4 Taiwan  Asia       1967    67.5 13648692     2644.
##  5 Taiwan  Asia       1972    69.4 15226039     4063.
##  6 Taiwan  Asia       1977    70.6 16785196     5597.
##  7 Taiwan  Asia       1982    72.2 18501390     7426.
##  8 Taiwan  Asia       1987    73.4 19757799    11055.
##  9 Taiwan  Asia       1992    74.3 20686918    15216.
## 10 Taiwan  Asia       1997    75.2 21628605    20207.
## 11 Taiwan  Asia       2002    77.0 22454239    23235.
## 12 Taiwan  Asia       2007    78.4 23174294    28718.

arrange()

Sort a data frame by a variable

gapminder %>% 
  filter(year == 2007) %>% 
  arrange(gdpPercap)
## # A tibble: 142 x 6
##    country                  continent  year lifeExp      pop gdpPercap
##    <fct>                    <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Congo, Dem. Rep.         Africa     2007    46.5 64606759      278.
##  2 Liberia                  Africa     2007    45.7  3193942      415.
##  3 Burundi                  Africa     2007    49.6  8390505      430.
##  4 Zimbabwe                 Africa     2007    43.5 12311143      470.
##  5 Guinea-Bissau            Africa     2007    46.4  1472041      579.
##  6 Niger                    Africa     2007    56.9 12894865      620.
##  7 Eritrea                  Africa     2007    58.0  4906585      641.
##  8 Ethiopia                 Africa     2007    52.9 76511887      691.
##  9 Central African Republic Africa     2007    44.7  4369038      706.
## 10 Gambia                   Africa     2007    59.4  1688359      753.
## # ... with 132 more rows
gapminder %>% 
  filter(year == 2007) %>% 
  arrange(desc(gdpPercap))
## # A tibble: 142 x 6
##    country          continent  year lifeExp       pop gdpPercap
##    <fct>            <fct>     <int>   <dbl>     <int>     <dbl>
##  1 Norway           Europe     2007    80.2   4627926    49357.
##  2 Kuwait           Asia       2007    77.6   2505559    47307.
##  3 Singapore        Asia       2007    80.0   4553009    47143.
##  4 United States    Americas   2007    78.2 301139947    42952.
##  5 Ireland          Europe     2007    78.9   4109086    40676.
##  6 Hong Kong, China Asia       2007    82.2   6980412    39725.
##  7 Switzerland      Europe     2007    81.7   7554661    37506.
##  8 Netherlands      Europe     2007    79.8  16570613    36798.
##  9 Canada           Americas   2007    80.7  33390141    36319.
## 10 Iceland          Europe     2007    81.8    301931    36181.
## # ... with 132 more rows
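
For comparison with the bracket-based selections shown earlier, sorting can also be written in base R with `order()`. The sketch below uses a hypothetical `toy_df` with made-up values, not the gapminder data.

```r
toy_df <- data.frame(
  country = c("A", "B", "C"),
  gdpPercap = c(500, 40000, 9000),  # made-up values
  stringsAsFactors = FALSE
)

ascending <- toy_df[order(toy_df$gdpPercap), ]    # similar to arrange(gdpPercap)
descending <- toy_df[order(-toy_df$gdpPercap), ]  # similar to arrange(desc(gdpPercap))

ascending
descending
```

One visible difference is that the base R result keeps the original row names, while `arrange()` numbers the rows again from 1.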

mutate()

  • Add a derived variable

\[\text{GDP per capita} = \frac{\text{GDP}}{\text{pop}} \\ \text{GDP} = \text{GDP per capita} \times \text{pop}\]

gapminder %>% 
  mutate(gdp = gdpPercap * pop)
## # A tibble: 1,704 x 7
##    country     continent  year lifeExp      pop gdpPercap          gdp
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>        <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.  6567086330.
##  2 Afghanistan Asia       1957    30.3  9240934      821.  7585448670.
##  3 Afghanistan Asia       1962    32.0 10267083      853.  8758855797.
##  4 Afghanistan Asia       1967    34.0 11537966      836.  9648014150.
##  5 Afghanistan Asia       1972    36.1 13079460      740.  9678553274.
##  6 Afghanistan Asia       1977    38.4 14880372      786. 11697659231.
##  7 Afghanistan Asia       1982    39.9 12881816      978. 12598563401.
##  8 Afghanistan Asia       1987    40.8 13867957      852. 11820990309.
##  9 Afghanistan Asia       1992    41.7 16317921      649. 10595901589.
## 10 Afghanistan Asia       1997    41.8 22227415      635. 14121995875.
## # ... with 1,694 more rows
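
A rough base R counterpart of `mutate()` is `transform()`. The sketch below uses a hypothetical `toy_df` with made-up values. It also shows that the original data frame is left unchanged unless the result is assigned, which is true of `mutate()` as well.

```r
toy_df <- data.frame(
  country = c("A", "B"),
  pop = c(1000, 2000),       # made-up values
  gdpPercap = c(500, 1500),  # made-up values
  stringsAsFactors = FALSE
)

with_gdp <- transform(toy_df, gdp = gdpPercap * pop)  # similar to mutate(gdp = gdpPercap * pop)

names(toy_df)    # the original has no gdp column
names(with_gdp)  # the result has the extra column
```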

summarise()

  • Summarise a variable
gapminder %>% 
  filter(year == 2007) %>% 
  summarise(median(gdpPercap))
## # A tibble: 1 x 1
##   `median(gdpPercap)`
##                 <dbl>
## 1               6124.

group_by()

gapminder %>% 
  group_by(year) %>% 
  summarise(median(gdpPercap))
## # A tibble: 12 x 2
##     year `median(gdpPercap)`
##    <int>               <dbl>
##  1  1952               1969.
##  2  1957               2173.
##  3  1962               2335.
##  4  1967               2678.
##  5  1972               3339.
##  6  1977               3799.
##  7  1982               4216.
##  8  1987               4280.
##  9  1992               4386.
## 10  1997               4782.
## 11  2002               5320.
## 12  2007               6124.
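
The grouped summary follows a split, apply, combine pattern: split the rows by `year`, compute the median in each group, and combine the results into one table. The sketch below does this in base R with `aggregate()` on a hypothetical `toy_df` with made-up values.

```r
toy_df <- data.frame(
  year = c(2002, 2002, 2002, 2007, 2007, 2007),
  gdpPercap = c(100, 300, 200, 400, 900, 500)  # made-up values
)

# Split gdpPercap by year, then take the median of each group
by_year <- aggregate(gdpPercap ~ year, data = toy_df, FUN = median)
names(by_year)[2] <- "median_gdpPercap"  # give the summary column a readable name

by_year
```

In dplyr the summary column can be named inside the call, for example `summarise(median_gdpPercap = median(gdpPercap))`, which avoids the backtick-quoted column name seen in the output above.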

The ggplot2 package